
    Hyphenation: from transformer models and word embeddings to a new linguistic rule-set

    Modern language models, especially those based on deep neural networks, frequently use bottom-up vocabulary generation techniques like Byte Pair Encoding (BPE) to create word pieces, enabling them to model any sequence of text even with a fixed-size vocabulary significantly smaller than the full training vocabulary. The resulting language models often prove extremely capable. Yet, when integrated into traditional Automatic Speech Recognition (ASR) pipelines, these language models can perform unsatisfactorily on rare or unseen text, because the resulting word pieces often don’t map cleanly to phoneme sequences (consider, for instance, Multilingual BERT’s unfortunate breaking of Sonnenlicht into Sonne+nl+icht). This impairs the acoustic model’s ability to generate the required token sequences, preventing good options from being considered in the first place. While approaches like Morfessor attempt to solve this problem using more refined algorithms, they use only the written form of a word as input, splitting words into parts without regard to the word’s actual meaning. Meanwhile, word embeddings for languages like Dutch have become extremely common and high-quality; this project investigates whether this knowledge about a word’s usage in context can be leveraged to yield better hyphenation quality. For this purpose, the following approach is evaluated: a baseline Transformer model is tasked with generating hyphenation candidates for a given word based on its written form, and those candidates are subsequently reranked based on the embedding of the hyphenated word. The results obtained are compared with those yielded by Morfessor on the same dataset. Finally, a new set of linguistic rules for Dutch hyphenation (suitable for use with Liang’s hyphenation algorithm from TeX82) is presented. The output of these rules is compared to that of currently available rule-sets.
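To make the reference to Liang’s hyphenation algorithm concrete, here is a minimal Python sketch of how pattern-based hyphenation works in general. It is not the rule-set from this thesis: the patterns `n1n`, `n1l`, and `n2l` are hypothetical toy patterns chosen only to illustrate the mechanism, and the function names and the `left_min`/`right_min` defaults are assumptions. In a Liang pattern, digits sit between letters; the highest digit found for each gap wins, and an odd value permits a break while an even value forbids it.

```python
def parse_pattern(pattern):
    """Split a Liang pattern such as 'n1l' into letters 'nl' and weights [0, 1, 0]."""
    letters = ""
    weights = [0]  # weights[k] is the score of the gap before letters[k]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters += ch
            weights.append(0)
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert hyphens where the highest matching pattern weight is odd."""
    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[j] is the gap before padded[j]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                position = start + offset
                scores[position] = max(scores[position], weight)
            start = padded.find(letters, start + 1)

    pieces, last = [], 0
    # Only gaps that leave at least left_min / right_min letters are considered.
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:  # the gap before word[i] is scores[i + 1]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Hypothetical toy patterns, not taken from any real rule-set.
print(hyphenate("sonnenlicht", ["n1n", "n1l"]))         # son-nen-licht
print(hyphenate("sonnenlicht", ["n1n", "n1l", "n2l"]))  # son-nenlicht
```

The second call shows the inhibiting role of even weights: the toy pattern `n2l` overrides `n1l` at the same gap, so that break is suppressed.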

    Self-Knowledge in Epictetus and Marcus Aurelius (La connaissance de soi chez Épictète et Marc-Aurèle)

    This article explores the theme of self-knowledge in the Stoic philosophers Epictetus and Marcus Aurelius. In light of the Socratic definition of gnothi seauton (know thyself), we propose to examine the “philosophy of the self” that Epictetus and Marcus Aurelius developed. More specifically, we aim to clarify the famous distinction that Epictetus draws in his Handbook (and that Marcus Aurelius takes up in his Meditations) between what depends on us (judgments, impulses, desires, aversions, etc.) and what does not depend on us (the body, fame, wealth, power). From the Stoic perspective shared by Epictetus and Marcus Aurelius, we seek to show that “knowing oneself” means being able to identify what falls within our own jurisdiction and is therefore not subject to Fate.

    Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge

    Background: More than 400,000 biomedical concepts and some of their relationships are contained in SnomedCT, a comprehensive biomedical ontology. However, their concept names are not always readily interpretable by non-experts, or by patients looking at their own electronic health records (EHR). Clear definitions or descriptions in understandable language are often not available. Therefore, generating human-readable definitions for biomedical concepts might help make the information they encode more accessible and understandable to a wider public. Objective: In this article, we introduce the Automatic Glossary of Clinical Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts generated using high-quality information extracted from the biomedical knowledge contained in SnomedCT. Methods: We generate a novel definition for every SnomedCT concept by prompting the OpenAI Turbo model, a variant of GPT 3.5, with a high-quality verbalization of the SnomedCT relationships of the to-be-defined concept. A significant subset of the generated definitions was subsequently judged by NLP researchers with biomedical expertise on 5-point scales along three axes: factuality, insight, and fluency. Results: AGCT contains 422,070 computer-generated definitions for SnomedCT concepts, covering various domains such as diseases, procedures, drugs, and anatomy. The average length of the definitions is 49 words. The definitions were assigned average scores of over 4.5 out of 5 on all three axes, indicating that a majority of the definitions are factual, insightful, and fluent. Conclusion: AGCT is a novel and valuable resource for biomedical tasks that require human-readable definitions for SnomedCT concepts. It can also serve as a base for developing robust biomedical retrieval models or other applications that leverage natural language understanding of biomedical knowledge. Comment: Accepted at the BioNLP 2023 workshop.

    Assessment of field rolling resistance of manual wheelchairs

    This article proposes a simple and convenient method for assessing the subject-specific rolling resistance acting on a manual wheelchair, which could be used during the provision of clinical service. This method, based on a simple mathematical equation, is sensitive to both the total mass and its fore-aft distribution, which changes with the subject, wheelchair properties, and adjustments. The rolling resistance properties of three types of front casters and four types of rear wheels were determined for two indoor surfaces commonly encountered by wheelchair users (a hard smooth surface and carpet) from three-dimensional accelerometer measurements taken during field deceleration tests performed with an artificial load. The average results provided by these experiments were then used as input data to assess the rolling resistance from the mathematical equation with acceptable accuracy on hard smooth and carpet surfaces (standard errors of the estimates were 4.4 and 3.9 N, respectively). Thus, this method can be confidently used by clinicians to help users make trade-offs between front and rear wheel types and sizes when choosing and adjusting their manual wheelchair. This material was based on work supported by the SACR-FRM project, French National Research Agency (ANR-06-TecSan-020) and the Centre d’Etudes et de Recherche sur l’Appareillage des Handicapés (loaned all MWCs required to fulfill this work).
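The abstract does not state the equation itself, so the following Python sketch only illustrates why such an estimate depends on both total mass and fore-aft distribution. It assumes a common simplification in which each axle contributes a resistance proportional to the load it carries; the coefficient values, parameter names, and function names are hypothetical and are not taken from the article. The second function shows the basic relation behind a deceleration test on level ground, ignoring wheel rotational inertia and air drag.

```python
G = 9.81  # gravitational acceleration in m/s^2


def rolling_resistance(total_mass_kg, front_fraction, c_front, c_rear):
    """Estimate rolling resistance in newtons on level ground.

    front_fraction is the share of the total weight carried by the front
    casters (0 to 1). c_front and c_rear are dimensionless rolling
    resistance coefficients for the caster and rear-wheel types
    (hypothetical inputs, assumed to come from prior field tests).
    """
    if not 0.0 <= front_fraction <= 1.0:
        raise ValueError("front_fraction must be between 0 and 1")
    weight = total_mass_kg * G
    front_load = weight * front_fraction
    rear_load = weight * (1.0 - front_fraction)
    return c_front * front_load + c_rear * rear_load


def resistance_from_deceleration(total_mass_kg, deceleration_ms2):
    """Resistive force in newtons implied by a measured coasting deceleration."""
    return total_mass_kg * deceleration_ms2


# Illustrative numbers only: a 100 kg user-plus-chair system.
print(rolling_resistance(100.0, 0.4, c_front=0.02, c_rear=0.01))
print(rolling_resistance(100.0, 0.2, c_front=0.02, c_rear=0.01))
```

With these assumed coefficients, shifting load from the casters to the rear wheels lowers the estimate, which is the kind of trade-off the abstract says clinicians can explore.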

    Gorgias’s Reaction to Parmenides’s Poem: Developing a Rhetoric as a Flight from Ontology (La réaction de Gorgias au Poème de Parménide : élaboration d'une rhétorique comme fuite de l'ontologie)

    The main objective of this master’s thesis is to examine closely the arguments that Gorgias develops in his treatise On Non-Being, in which he defends the impossibility of knowledge. There are three arguments in this treatise: a) nothing exists; b) even if something exists, it cannot be known; c) even if it can be known, that knowledge cannot be communicated to others. These arguments not only attack the Parmenidean thesis of the correspondence between “thinking” (noein) and “being” (einai); they also serve to justify the art of rhetoric. Indeed, without knowledge that would allow us to tell truth from falsehood, human beings are left with nothing but their personal interests and the rhetorical means of making them prevail. In such a context, rhetoric becomes, strictly speaking, the only truly legitimate science (or art).

    Neural Posterior Estimation with Differentiable Simulators

    Simulation-Based Inference (SBI) is a promising Bayesian inference framework that alleviates the need for analytic likelihoods to estimate posterior distributions. Recent advances using neural density estimators in SBI algorithms have demonstrated the ability to achieve high-fidelity posteriors, at the expense of a large number of simulations, which makes their application potentially very time-consuming when using complex physical simulations. In this work we focus on boosting the sample-efficiency of posterior density estimation using the gradients of the simulator. We present a new method to perform Neural Posterior Estimation (NPE) with a differentiable simulator. We demonstrate how gradient information helps constrain the shape of the posterior and improves sample-efficiency. Comment: Accepted at the ICML 2022 Workshop on Machine Learning for Astrophysics.